Hate speech recognition in multilingual text: Hinglish documents

Abstract

The Internet is a boon for mankind, but its misuse has been increasing drastically. Social networking platforms such as Facebook, Twitter and Instagram play a predominant role in letting users express their views. Sometimes users wield abusive or inflammatory language, which may provoke readers. This paper aims to evaluate various machine learning and deep learning techniques to detect hate speech on social media in the Hinglish (English-Hindi code-mixed) language. In this paper, we apply several machine learning and deep learning methods, along with feature extraction and word-embedding techniques, to a consolidated dataset of 20600 instances, for hate speech detection from tweets and comments in Hinglish. The experimental results reveal that deep learning models perform better than machine learning models in general. Among the deep learning models, the CNN-BiLSTM model with word2vec word embedding provides the best results. The model yields 0.876 accuracy, 0.830 precision, 0.840 recall and 0.835 F1-score. These results surpass recent state-of-the-art approaches.
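As a reading aid, the four reported metrics can be related to each other with a short Python sketch. It assumes the standard binary-classification definitions (the abstract does not say how the scores were averaged), and the function names and confusion-matrix counts are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts for the positive ("hate") class; the values
    passed in below are hypothetical, not the paper's data.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


if __name__ == "__main__":
    # Consistency check on the figures quoted in the abstract.
    print(round(f1_from_precision_recall(0.830, 0.840), 3))
    # Hypothetical counts, for illustration only.
    print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```

Under these definitions, the reported precision of 0.830 and recall of 0.840 give an F1-score that rounds to 0.835, which agrees with the value quoted in the abstract.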

Similar resources

Character-Based Handwritten Text Recognition of Multilingual Documents

An effective approach to transcribe handwritten text documents is to follow a sequential interactive approach. During the supervision phase, user corrections are incorporated into the system through an ongoing retraining process. In the case of multilingual documents with a high percentage of out-of-vocabulary (OOV) words, two principal issues arise. On the one hand, a minor yet important matte...

Multilingual Speech Recognition

We present two concepts for systems with language identification in the context of multilingual information retrieval dialogs. The first one has an explicit module for language identification. It is based on training a common codebook for all the languages and integrating over the output probabilities of language-specific n-gram models trained over the codebook sequences. The system can decide f...

Multilingual Speech Recognition

The speech-to-speech translation system Verbmobil requires a multilingual setting. This consists of recognition engines in the three languages German, English and Japanese that run in one common framework together with a language identification component which is able to switch between these recognizers. This article describes the challenges of multilingual speech recognition and presents diffe...

Clustering multilingual documents by estimating text-to-text semantic relatedness

This thesis is about multilingual document clustering through estimating semantic relatedness between multilingual texts. Specifically, we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora based approach and a web-searches based approach. Our first approach derives pairwi...

Journal

Journal title: International Journal of Information Technology

Year: 2023

ISSN: 2511-2112, 2511-2104

DOI: https://doi.org/10.1007/s41870-023-01211-z